Phrasetable Smoothing for Statistical Machine Translation

Authors

  • George F. Foster
  • Roland Kuhn
  • Howard Johnson

Abstract

We discuss different strategies for smoothing the phrasetable in statistical MT and give results over a range of translation settings. We show that any type of smoothing is preferable to the relative-frequency estimates that are often used. The best smoothing techniques yield consistent gains of approximately 1% (absolute) according to the BLEU metric.
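
The abstract contrasts unsmoothed relative-frequency phrasetable estimates with smoothed ones but does not name the smoothing methods. The Python sketch below illustrates the contrast under stated assumptions: it uses absolute discounting, backed off to the target-phrase marginal, as one simple example scheme. The scheme, the discount d = 0.5, the helper names `relative_frequency` and `absolute_discount`, and the toy phrase pairs are all hypothetical choices for illustration, not the paper's method or data. The key point it shows is that a phrase pair observed only once receives probability 1.0 under relative frequency, while smoothing shifts some of that mass to other targets.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

PhrasePair = Tuple[str, str]             # (source phrase s, target phrase t)
Estimator = Callable[[str, str], float]  # returns p(t | s)


def relative_frequency(pairs: Iterable[PhrasePair]) -> Estimator:
    """Unsmoothed estimate p(t|s) = c(s,t) / c(s); unseen pairs get 0."""
    joint = Counter(pairs)
    src = Counter()
    for (s, _), c in joint.items():
        src[s] += c

    def prob(s: str, t: str) -> float:
        return joint[(s, t)] / src[s] if src[s] else 0.0

    return prob


def absolute_discount(pairs: Iterable[PhrasePair], d: float = 0.5) -> Estimator:
    """Illustrative smoothing only (assumed scheme, not necessarily from the paper).

        p(t|s)   = max(c(s,t) - d, 0) / c(s) + alpha(s) * p_b(t)
        alpha(s) = d * n(s) / c(s)   (n(s) = distinct targets seen with s)
        p_b(t)   = c(t) / N          (target-phrase marginal)

    The mass subtracted from observed pairs is redistributed through p_b,
    so p(. | s) still sums to 1 over all observed target phrases.
    """
    if not 0 < d <= 1:
        raise ValueError("discount d must lie in (0, 1]")
    joint = Counter(pairs)
    src, tgt, n_types = Counter(), Counter(), Counter()
    for (s, t), c in joint.items():
        src[s] += c
        tgt[t] += c
        n_types[s] += 1
    total = sum(joint.values())

    def prob(s: str, t: str) -> float:
        p_b = tgt[t] / total if total else 0.0
        if not src[s]:
            return p_b  # unseen source phrase: fall back to the marginal
        alpha = d * n_types[s] / src[s]
        return max(joint[(s, t)] - d, 0.0) / src[s] + alpha * p_b

    return prob


if __name__ == "__main__":
    # Hypothetical toy phrase pairs, as if extracted from a word-aligned corpus.
    pairs = [("la maison", "the house"), ("la maison", "the house"),
             ("la maison", "the home"), ("maison", "house")]
    rf, sm = relative_frequency(pairs), absolute_discount(pairs, d=0.5)
    print(rf("maison", "house"), sm("maison", "house"))          # 1.0 0.625
    print(rf("maison", "the house"), sm("maison", "the house"))  # 0.0 0.25
```

Returning a function rather than a table keeps the smoothed estimate defined for unseen pairs, which is exactly where relative frequency assigns zero.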

Similar resources

Improving Translation Quality by Discarding Most of the Phrasetable

It is possible to reduce the bulk of phrasetables for Statistical Machine Translation using a technique based on the significance testing of phrase pair co-occurrence in the parallel corpus. The savings can be quite substantial (up to 90%) and cause no reduction in BLEU score. In some cases, an improvement in BLEU is obtained at the same time although the effect is less pronounced if state-of-t...

Decision Trees for Lexical Smoothing in Statistical Machine Translation

We present a method for incorporating arbitrary context-informed word attributes into statistical machine translation by clustering attribute-qualified source words, and smoothing their word translation probabilities using binary decision trees. We describe two ways in which the decision trees are used in machine translation: by using the attribute-qualified source word clusters directly, or by u...

Tuning Statistical Machine Translation Parameters

Word alignment is the basis of statistical machine translation. GIZA++ is a popular tool for producing word alignments and translation models. It uses a set of parameters that affect the quality of word alignments and translation models. These parameters exist to overcome problems such as overfitting. This paper addresses the problem of tuning GIZA++ parameters for better translation qualit...

Performance Analysis of Different Smoothing Methods on n-grams for Statistical Machine Translation

Smoothing techniques adjust the maximum likelihood estimate of probabilities to produce more accurate probabilities. This is one of the most important tasks when building a language model from a limited amount of training data. The main contribution of this paper is to analyze the performance of different smoothing techniques on n-grams. Here we considered the three most widely used smoothing algo...

A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU

BLEU is the de facto standard machine translation (MT) evaluation metric. However, because BLEU computes a geometric mean of n-gram precisions, it often correlates poorly with human judgment at the sentence level. Therefore, several smoothing techniques have been proposed. This paper systematically compares 7 smoothing techniques for sentence-level BLEU. Three of them are first proposed in this...

Publication date: 2006